ABSTRACT

The problem of finding a local minimum of a black-box function
is central for understanding local search as well as quantum
adiabatic algorithms. For functions on the Boolean hypercube
$\{0,1\}^n$, we show a lower bound of $\Omega(2^{n/4}/n)$
on the number of queries needed by a quantum computer to
solve this problem. More surprisingly, our approach, based
on Ambainis's quantum adversary method, also yields a
lower bound of $\Omega(2^{n/2}/n^2)$ on the problem's classical
randomized query complexity. This improves and simplifies a
1983 result of Aldous. Finally, in both the randomized and
quantum cases, we give the first nontrivial lower bounds for
finding local minima on grids of constant dimension $d \ge 3$.

1. INTRODUCTION 

This paper deals with the following problem. 



Local Search. Given an undirected graph $G = (V, E)$ and
a function $f : V \to \mathbb{N}$, find a local minimum of $f$; that is,
a vertex $v$ such that $f(v) \le f(w)$ for all neighbors $w$ of $v$.

We are interested in the number of queries that an algorithm
needs to solve this problem, where a query just returns $f(v)$
given $v$. We consider deterministic, randomized, and quantum
algorithms. Section 2 motivates the problem theoretically
and practically; this section explains our results.

We start with some simple observations. If $G$ is the complete
graph of size $N$, then clearly $\Omega(N)$ queries are needed
to find a local minimum (or $\Omega(\sqrt{N})$ with a quantum
computer [9]). At the other extreme, if $G$ is a line of length $N$,
then even a deterministic algorithm can find a local minimum
in $O(\log N)$ queries, using binary search: query the
middle two vertices, $v$ and $w$. If $f(v) \le f(w)$, then search
the line of length $(N-2)/2$ connected to $v$; otherwise search
the line connected to $w$. Continue recursively in this manner
until a local minimum is found.
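The binary search just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `local_min_on_line`, the 0-based vertex labels, and the query cache are assumptions made for the example, not part of the paper. The loop keeps an interval that is guaranteed to contain a local minimum of the whole line and halves it with two queries per round.

```python
def local_min_on_line(f, n):
    """Find a local minimum of f on the path 0, 1, ..., n-1.

    Returns (index, number_of_distinct_queries).
    Invariant: lo == 0 or f(lo-1) > f(lo); hi == n-1 or f(hi) <= f(hi+1).
    Under this invariant the minimum of f on [lo, hi] is a local minimum
    of the whole line.
    """
    cache = {}

    def query(i):
        # Count each vertex at most once.
        if i not in cache:
            cache[i] = f(i)
        return cache[i]

    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if query(mid) <= query(mid + 1):
            hi = mid          # a local minimum lies in [lo, mid]
        else:
            lo = mid + 1      # a local minimum lies in [mid+1, hi]
    return lo, len(cache)
```

Each round makes at most two new queries and halves the interval, so the number of queries is at most about $2 \log_2 N$.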

So the interesting case is when $G$ is a graph of 'intermediate'
connectedness: for example, the Boolean hypercube $\{0,1\}^n$,
with two vertices adjacent if and only if they have Hamming
distance 1. For this graph, Llewellyn, Tovey, and Trick [19]
showed an $\Omega(2^n/\sqrt{n})$ lower bound on the number of queries
needed by any deterministic algorithm, using a simple adversary
argument. Intuitively, until the set of vertices queried
so far comprises a vertex cut (that is, splits the graph into
two or more connected components), an adversary is free to
return a descending sequence of $f$-values: $f(v_1) = 2^n$ for the
first vertex $v_1$ queried by the algorithm, $f(v_2) = 2^n - 1$ for
the second vertex queried, and so on. Moreover, once the
set of queried vertices does comprise a cut, the adversary
can choose the largest connected component of unqueried
vertices, and restrict the problem recursively to that component.
So to lower-bound the deterministic query complexity,
it suffices to lower-bound the size of any cut that splits
the graph into two reasonably large components.^1 For the
Boolean hypercube, Llewellyn et al. showed that the best
one can do is essentially to query all $\Omega(2^n/\sqrt{n})$ vertices of
Hamming weight $n/2$.

Llewellyn et al.'s argument fails completely in the case of
randomized algorithms. By Yao's minimax principle, what
we want here is a fixed distribution $\mathcal{D}$ over functions
$f : \{0,1\}^n \to \mathbb{N}$, such that any deterministic algorithm needs
many queries to find a local minimum of $f$, with high probability
if $f$ is drawn from $\mathcal{D}$. Taking $\mathcal{D}$ to be uniform will
not do, since a local minimum of a uniform random function
is easily found. However, Aldous [3] had the idea of
defining $\mathcal{D}$ via a random walk, as follows. Choose a vertex
$v_0 \in \{0,1\}^n$ uniformly at random; then perform an unbiased
walk^2 $v_0, v_1, v_2, \ldots$ starting from $v_0$. For each vertex
$v$, set $f(v)$ equal to the first hitting time of the walk at $v$;
that is, $f(v) = \min\{t : v_t = v\}$. Clearly any $f$ produced
in this way has a unique local minimum at $v_0$, since for all



1 Llewellyn et al. actually give a tight characterization of 
deterministic query complexity in terms of vertex cuts. 
2 Actually, Aldous used a continuous-time random walk, so
the functions would be from $\{0,1\}^n$ to $\mathbb{R}$.



$t > 0$, if vertex $v_t$ is visited for the first time at step $t$ then
$f(v_t) > f(v_{t-1})$. Using sophisticated random walk analysis,
Aldous managed to show a lower bound of $2^{n/2 - o(n)}$ on
the expected number of queries needed by any randomized
algorithm to find $v_0$.^3 (As we will see in Section 3, this
lower bound is close to tight.) Intuitively, since a random
walk on the hypercube mixes in $O(n \log n)$ steps, an algorithm
that has not queried a $v$ with $f(v) < 2^{n/2}$ has almost
no useful information about where the unique minimum $v_0$
is, so its next query will just be a "stab in the dark."
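To make the hitting-time construction concrete, here is a small Python sketch that builds $f$ from a discrete-time unbiased walk on $\{0,1\}^n$ and checks that $v_0$ is the only local minimum. The names `hitting_time_function` and `local_minima` are hypothetical, vertices are encoded as $n$-bit integers, and the walk is simply run until every vertex has been visited; these are assumptions of the sketch, not details taken from Aldous's analysis (which, as noted in the footnote, uses a continuous-time walk).

```python
import random


def neighbors(v, n):
    """Hypercube neighbors of v: flip each of the n bits in turn."""
    return [v ^ (1 << i) for i in range(n)]


def hitting_time_function(n, rng):
    """Return f as a dict, where f[v] is the first time the walk hits v."""
    v = rng.randrange(2 ** n)      # v_0, chosen uniformly at random
    f = {v: 0}
    t = 0
    while len(f) < 2 ** n:         # assumption: walk until all vertices are hit
        t += 1
        v = rng.choice(neighbors(v, n))
        f.setdefault(v, t)         # record only the first visit
    return f


def local_minima(f, n):
    """All v with f[v] <= f[w] for every neighbor w."""
    return [v for v in f if all(f[v] <= f[w] for w in neighbors(v, n))]
```

For small $n$ one can confirm that `local_minima` returns a single vertex with value 0: any other vertex first hit at time $t > 0$ has the neighbor $v_{t-1}$, which was hit strictly earlier.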

Aldous's result leaves several questions about Local Search
unanswered. What if the graph G is a 3-D cube, on which a 
random walk does not mix very rapidly? Can we still lower- 
bound the randomized query complexity of finding a local 
minimum? More generally, what parameters of G make the 
problem hard or easy? Also, what is the quantum query 
complexity of Local Search? 

This paper presents a new approach to Local Search,
which we believe points the way to a complete understanding
of its complexity. Our approach is based on the quantum
adversary method, introduced by Ambainis [5] to prove
lower bounds on quantum query complexity. Surprisingly,
our approach yields new and simpler lower bounds for the
problem's classical randomized query complexity, in addition
to quantum lower bounds. Thus, along with recent
work by Kerenidis and de Wolf [17] and by Aharonov and
Regev, this paper illustrates how quantum ideas can help
to resolve classical open problems.

Our results are as follows. For the Boolean hypercube
$G = \{0,1\}^n$, we show that any quantum algorithm needs
$\Omega(2^{n/4}/n)$ queries to find a local minimum on $G$, and any
randomized algorithm needs $\Omega(2^{n/2}/n^2)$ queries (improving
the $2^{n/2 - o(n)}$ lower bound of Aldous [3]). Our proofs are
elementary and do not require random walk analysis. By
comparison, the best known upper bounds are $O(2^{n/3} n^{1/6})$
for a quantum algorithm and $O(2^{n/2} \sqrt{n})$ for a randomized
algorithm. If $G$ is a $d$-dimensional grid of size
$N^{1/d} \times \cdots \times N^{1/d}$, where $d \ge 3$ is a constant, then we
show that any quantum algorithm needs
$\Omega(\sqrt{N^{1/2 - 1/d} / \log N})$ queries to find a local minimum
on $G$, and any randomized algorithm needs
$\Omega(N^{1/2 - 1/d} / \log N)$ queries. No nontrivial lower
bounds (randomized or quantum) were previously known in
this case.^4

In an earlier version of this paper, we raised as our "most
ambitious" conjecture that the deterministic and quantum
query complexities of local search are polynomially related
for every family of graphs. At the time, it was not even
known whether deterministic and randomized query complexities
were polynomially related, not even for simple examples
such as the 2-dimensional square grid. Recently
Santha and Szegedy [22] spectacularly resolved our conjecture,
by showing that the quantum query complexity is
at least the 19th root (!) of the deterministic complexity.
Given that their result generalizes ours to such an extent,
we feel obligated to defend why this paper is still relevant.
First, for specific graphs such as the hypercube, our lower
bounds are close to tight; those of Santha and Szegedy are
not. Second, we give randomized lower bounds that are
quadratically better than our quantum lower bounds; Santha
and Szegedy give only quantum lower bounds.

3 Independently and much later, Droste et al. [13] showed
the weaker bound $2^{g(n)}$ for any $g(n) = o(n)$.
4 A lower bound on deterministic query complexity is known
for such graphs [18].

In another recent development, Ambainis (personal
communication) has improved our $\Omega(2^{n/4}/n)$ quantum lower
bound for local search on the hypercube to $2^{n/3}/n^{O(1)}$, using
a hybrid argument. Note that Ambainis' lower bound
matches the upper bound up to a polynomial factor.

The paper is organized as follows. Section 2 motivates
lower bounds on Local Search, pointing out connections
to simulated annealing, quantum adiabatic algorithms, and
the complexity class TFNP of total function problems.
Section 3 defines notation and reviews basic facts about Local
Search, including upper bounds. In Section 4 we give
an intuitive explanation of Ambainis's quantum adversary
method, then state and prove a classical analogue of
Ambainis's main lower bound theorem. Section 5 introduces
snakes, a construction by which we apply the two adversary
methods to Local Search. We show there that to
prove lower bounds for any graph $G$, it suffices to upper-bound
a combinatorial parameter $\varepsilon$ of a 'snake distribution'
on $G$. Section 6 applies this framework to specific examples
of graphs: the Boolean hypercube in Section 6.1 and the
$d$-dimensional grid in Section 6.2.

2. MOTIVATION 

Local search is the most effective weapon ever devised against 
hard optimization problems. For many real applications, 
neither backtrack search, nor approximation algorithms, nor 
even Grover's algorithm (assuming we had a quantum com- 
puter) can compare. Furthermore, along with quantum 
computing, local search (broadly defined) is one of the most 
interesting links between computer science and Nature. It 
is related to evolutionary biology via genetic algorithms, and 
to the physics of materials via simulated annealing. Thus 
it is both practically and scientifically important to under- 
stand its performance. 

The conventional wisdom is that, although local search per- 
forms well in practice, its central (indeed defining) flaw is a 
tendency to get stuck at local optima. If this were correct, 
one corollary would be that the reason local search performs 
so well is that the problem it really solves — finding a local 
optimum — is intrinsically easy. It would thus be unneces- 
sary to seek further explanations for its performance. An- 
other corollary would be that, for unimodal functions (which 
have no local optima besides the global optimum) , the global 
optimum would be easily found. 

However, the conventional wisdom is false. The results of
Llewellyn et al. [19] and Aldous [3] show that even if $f$ is
unimodal, any classical algorithm that treats $f$ as a black
box needs exponential time to find the global minimum of $f$
in general. Our results extend this conclusion to quantum



algorithms. In our view, the practical upshot of these results 
is that they force us to confront the question: What is it 
about 'real-world' problems that makes it easy to find a 
local optimum? That is, why do exponentially long chains 
of descending values, such as those used for lower bounds, 
almost never occur in practice (even in functions with large 
range sizes)? We do not know a good answer to this. 

Our results are also relevant for physics. Many physical 
systems, including folding proteins and networks of springs 
and pulleys, can be understood as performing 'local search' 
through an energy landscape to reach a locally-minimal en- 
ergy configuration. A key question is, how long will the 
system take to reach its ground state (that is, a globally- 
minimal configuration)? Of course, if there are local op- 
tima, the system might never reach its ground state, just 
as a rock in a mountain crevice does not roll to the bot- 
tom by going up first. But what if the energy landscape is 
unimodal? And moreover, what if the physical system is 
quantum? Our results show that, for certain energy land- 
scapes, even a quantum system would take exponential time 
to reach its ground state, regardless of what Hamiltonian is 
applied to it. So in particular, the quantum adiabatic algorithm
proposed by Farhi et al. [15], which can be seen as a
quantum analogue of simulated annealing, needs exponential
time to find a local minimum in the worst case.

Finally, our results have implications for so-called total function
problems in complexity theory. Megiddo and Papadimitriou [20]
defined a complexity class^5 TFNP, consisting (informally)
of those NP search problems for which a solution
always exists. For example, we might be given a function
$f : \{0,1\}^n \to \{0,1\}^{n-1}$ as a Boolean circuit, and asked to
find any distinct $x, y$ pair such that $f(x) = f(y)$. This particular
problem belongs to a subclass of TFNP called PPP
(Polynomial Pigeonhole Principle). Notice that no promise
is involved: the combinatorial nature of the problem itself
forces a solution to exist, even if we have no idea how to
find it. In a recent talk, Papadimitriou [21] asked broadly
whether such 'nonconstructive existence problems' might be
good candidates for efficient quantum algorithms. In the
case of PPP problems, the collision lower bound of Aaronson
(improved by Shi [25] and others) implies a negative answer
in the black-box setting. For other subclasses of TFNP,
such as PODN (Polynomial Odd-Degree Node), a quantum
black-box lower bound follows easily from the optimality of
Grover's search algorithm.

However, there is one important subclass of TFNP for which
no quantum lower bound was previously known. This is
PLS (Polynomial Local Search), defined by Johnson, Papadimitriou,
and Yannakakis [16] as a class of optimization
problems whose cost function $f$ and neighborhood function
$n$ (that is, the set of neighbors of a given point) are both
computable in polynomial time. Given such a problem,
the task is to output any local minimum of the cost function:
that is, a $v$ such that $f(v) \le f(w)$ for all $w \in n(v)$.
The lower bound of Llewellyn et al. [19] yields an oracle $A$
relative to which $\mathrm{FP}^A \ne \mathrm{PLS}^A$, by a standard diagonalization
argument along the lines of Baker, Gill, and Solovay
[7]. Likewise, the lower bound of Aldous [3] yields an oracle

5 See www.cs.berkeley.edu/~aaronson/zoo.html for details 
about the complexity classes mentioned in this paper. 



relative to which $\mathrm{PLS} \not\subset \mathrm{FBPP}$, where FBPP is simply the
function version of BPP. Our results yield the first oracle
relative to which $\mathrm{PLS} \not\subset \mathrm{FBQP}$. In light of this oracle
separation, we raise an admittedly vague question: is there
a nontrivial "combinatorial" subclass of TFNP that we can
show is contained in FBQP?

3. PRELIMINARIES 

In the Local Search problem, we are given an undirected
graph $G = (V, E)$ with $N = |V|$, and oracle access to a
function $f : V \to \mathbb{N}$. The goal is to find any local minimum
of $f$, defined as a vertex $v \in V$ such that $f(v) \le f(w)$ for
all neighbors $w$ of $v$. Clearly such a local minimum exists.
We want to find one using as few queries as possible, where a
query returns $f(v)$ given $v$. Queries can be adaptive; that
is, can depend on the outcomes of previous queries. We
assume $G$ is known in advance, so that only $f$ needs to be
queried. Since we care only about query complexity, not
computation time, there is no difficulty in dealing with an
infinite range for $f$, though for our lower bounds, it will
turn out that a range of size $O(|V|)$ suffices.

Our model of query algorithms is the standard one; see 
for a survey. Given a graph G, the deterministic query com- 
plexity of Local Search on G, which we denote DLS (G), 
is $\min_{\Gamma} \max_{f} T(\Gamma, f, G)$ where the minimum ranges over all
deterministic algorithms $\Gamma$, the maximum ranges over all $f$,
and $T(\Gamma, f, G)$ is the number of queries made to $f$ by $\Gamma$ before
it halts and outputs a local minimum of $f$ (or $\infty$ if $\Gamma$
fails to do so). The randomized query complexity RLS$(G)$
is defined similarly, except that now the algorithm has access
to an infinite random string $R$, and must only output a local
minimum with probability at least 2/3 over $R$. For simplicity,
we assume that the number of queries $T$ is the same for
all $R$; clearly this assumption changes the complexity by at
most a constant factor.

In the quantum model, an algorithm's state has the form
$\sum_{v,z,s} \alpha_{v,z,s} |v, z, s\rangle$, where $v$ is the label of a vertex in $G$,
and $z$ and $s$ are strings representing the answer register and
workspace respectively. The $\alpha_{v,z,s}$'s are complex amplitudes
satisfying $\sum_{v,z,s} |\alpha_{v,z,s}|^2 = 1$. Starting from an
arbitrary (fixed) initial state, the algorithm proceeds by an
alternating sequence of queries and algorithm steps. A
query maps each $|v, z, s\rangle$ to $|v, z \oplus f(v), s\rangle$, where $\oplus$
denotes bitwise exclusive-OR. An algorithm step multiplies
the vector of $\alpha_{v,z,s}$'s by an arbitrary unitary matrix that
does not depend on $f$. Letting $\mathcal{M}_f$ denote the set of
local minima of $f$, the algorithm succeeds if at the end
$\sum_{v,z,s \,:\, v \in \mathcal{M}_f} |\alpha_{v,z,s}|^2 \ge 2/3$. Then the bounded-error
quantum query complexity, or QLS$(G)$, is defined as the minimum
number of queries used by a quantum algorithm that
succeeds on every $f$.

It is immediate that QLS$(G) \le$ RLS$(G) \le$ DLS$(G) \le N$.
Also, letting $\delta$ be the maximum degree of $G$, we have the
following trivial lower bound.

Proposition 1. RLS$(G) = \Omega(\delta)$ and QLS$(G) = \Omega(\sqrt{\delta})$.

Proof. Let $v$ be a vertex of $G$ with degree $\delta$. Choose a
neighbor $w$ of $v$ uniformly at random, and let $f(w) = 1$. Let
$f(v) = 2$, and $f(u) = 3$ for all neighbors $u$ of $v$ other than
$w$. Let $S$ be the neighbor set of $v$ (including $v$ itself); then
for all $x \notin S$, let $f(x) = 3 + \Delta(x, S)$ where $\Delta(x, S)$ is the
minimum distance from $x$ to a vertex in $S$. Clearly $f$ has
a unique local minimum at $w$. However, finding $w$ requires
exhaustive search among the $\delta$ neighbors of $v$, which requires
$\Omega(\sqrt{\delta})$ quantum queries. □

A corollary of Proposition 1 is that classically, zero-error
randomized query complexity is equivalent to bounded-error
up to a constant factor. For given a candidate local minimum
$v$, one can check using $O(\delta)$ queries that $v$ is indeed
a local minimum. Since $\Omega(\delta)$ queries are needed anyway,
this verification step does not affect the overall complexity.

As pointed out by Aldous [3], a classical randomized algorithm
can find a local minimum of $f$ with high probability
in $O(\sqrt{N\delta})$ queries. The algorithm just queries $\sqrt{N\delta}$ vertices
uniformly at random, and lets $v_0$ be a queried vertex
for which $f(v)$ is minimal. It then follows $v_0$ to a local
minimum by steepest descent. That is, for $t = 0, 1, 2, \ldots$,
it queries all neighbors of $v_t$, halts if $v_t$ is a local minimum,
and otherwise sets $v_{t+1}$ to be the neighbor $w$ of $v_t$ for which
$f(w)$ is minimal (breaking ties by lexicographic ordering).
A similar idea yields an improved quantum upper bound.
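The classical sample-then-descend algorithm described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `randomized_local_search`, the explicit `vertices` list and `neighbors` callback, sampling without replacement, and the query cache are all choices made for the example and are not specified in the text.

```python
import math
import random


def randomized_local_search(f, vertices, neighbors, delta, rng):
    """Sample about sqrt(N * delta) vertices, then do steepest descent.

    f         : black-box function on vertices
    vertices  : list of all vertices (assumed comparable, for tie-breaking)
    neighbors : callable returning the neighbor list of a vertex
    delta     : maximum degree of the graph
    Returns (local_minimum, number_of_distinct_queries).
    """
    cache = {}

    def query(v):
        if v not in cache:
            cache[v] = f(v)
        return cache[v]

    n_total = len(vertices)
    k = min(n_total, math.isqrt(n_total * delta) + 1)
    sample = rng.sample(vertices, k)              # assumption: no repeats
    v = min(sample, key=lambda u: (query(u), u))  # best sampled vertex v_0

    while True:
        # Query all neighbors; ties are broken by vertex order.
        best = min(neighbors(v), key=lambda u: (query(u), u))
        if query(v) <= query(best):
            return v, len(cache)                  # v is a local minimum
        v = best                                  # strictly smaller f-value
```

The returned vertex is always a genuine local minimum, since the loop only stops after checking every neighbor; the randomness affects only how many queries are spent.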

Proposition 2. For any $G$, QLS$(G) = O(N^{1/3} \delta^{1/6})$.

Proof. The algorithm first chooses $N^{2/3} \delta^{1/3}$ vertices of
$G$ uniformly at random, then uses Grover search to find a
chosen vertex $v_0$ for which $f(v)$ is minimal. By a result of
Dürr and Høyer [14], this can be done with high probability
in $O(N^{1/3} \delta^{1/6})$ queries. Next, for $t = 0, 1, 2, \ldots$, the
algorithm performs Grover search over all neighbors of $v_t$,
looking for a neighbor $w$ such that $f(w) < f(v_t)$. If it finds
such a $w$, then it sets $v_{t+1} := w$ and continues to the next
iteration. Otherwise, it repeats the Grover search $\log(N/\delta)$
times before finally giving up and returning $v_t$ as a claimed
local minimum.

The expected number of vertices $u$ such that $f(u) < f(v_0)$ is
at most $N / (N^{2/3} \delta^{1/3}) = (N/\delta)^{1/3}$. Since $f(v_{t+1}) < f(v_t)$
for all $t$, clearly the number of such $u$ provides an upper
bound on $t$. Furthermore, assuming there exists a $w$ such
that $f(w) < f(v_t)$, the expected number of repetitions of
Grover's algorithm until such a $w$ is found is $O(1)$. Since
each repetition takes $O(\sqrt{\delta})$ queries, by linearity of
expectation the total expected number of queries used by the
algorithm is therefore

\[ O\left( N^{1/3} \delta^{1/6} + (N/\delta)^{1/3} \sqrt{\delta} + \log(N/\delta) \sqrt{\delta} \right) \]

or $O(N^{1/3} \delta^{1/6})$. To see that the algorithm finds a local
minimum with high probability, observe that for each $t$, the
probability of not finding a $w$ such that $f(w) < f(v_t)$, given
that one exists, is at most $c^{-\log(N/\delta)} \le (\delta/N)^{1/3}/10$ for a
suitable constant $c$. So by the union bound, the probability
that the algorithm returns a 'false positive' is at most
$(N/\delta)^{1/3} \cdot (\delta/N)^{1/3}/10 = 1/10$. □
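As a side calculation, the sample size $N^{2/3}\delta^{1/3}$ in the proof can be seen as the value that balances the two dominant costs. The following is a heuristic sketch, where $k$ (the number of sampled vertices) is a free parameter introduced only for this illustration, and the lower-order $\log(N/\delta)\sqrt{\delta}$ term is ignored.

```latex
% k = number of sampled vertices (hypothetical parameter for this sketch).
% Minimum finding over k items costs O(sqrt(k)) queries; the descent takes
% at most about N/k steps in expectation, each costing O(sqrt(delta)).
\begin{align*}
\mathrm{cost}(k) &= O\bigl(\sqrt{k}\bigr) + O\Bigl(\tfrac{N}{k}\sqrt{\delta}\Bigr), \\
\sqrt{k} = \tfrac{N}{k}\sqrt{\delta}
  &\iff k^{3/2} = N\,\delta^{1/2}
  \iff k = N^{2/3}\delta^{1/3}, \\
\mathrm{cost}\bigl(N^{2/3}\delta^{1/3}\bigr) &= O\bigl(N^{1/3}\delta^{1/6}\bigr).
\end{align*}
```

With this choice both terms equal $N^{1/3}\delta^{1/6}$, which matches the bound stated in Proposition 2.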



4. RELATIONAL ADVERSARY METHOD 

We know of essentially two methods for proving lower bounds 
on quantum query complexity: the polynomial method of 
Beals et al. [8], and the quantum adversary method of
Ambainis [5].^6 For a few problems, such as the collision problem,
the polynomial method succeeded where the adversary
method failed. However, for problems that lack permutation
symmetry (such as Local Search), the adversary
method has proven more effective.^7

How could a quantum lower bound method possibly be ap- 
plied classically? When proving randomized lower bounds, 
the tendency is to attack "bare-handed": fix a distribution 
over inputs, and let $x_1, \ldots, x_t$ be the locations queried so far
by the algorithm. Show that for small $t$, the posterior distribution
over inputs, conditioned on $x_1, \ldots, x_t$, is still 'hard'
with high probability, so that the algorithm knows almost
nothing even about which location $x_{t+1}$ to query next. This
is essentially the approach taken by Aldous [3] to prove a
$2^{n/2 - o(n)}$ lower bound on RLS$(\{0,1\}^n)$.

In the quantum case, however, it is unclear how to specify 
what an algorithm 'knows' after a given number of queries. 
So we are almost forced to step back, and identify general 
combinatorial properties of input sets that make them hard 
to distinguish. Once we have such properties, we can then 
try to exhibit them in functions of interest. 

We believe this "gloved" attack can be useful for classical
lower bounds as well as quantum ones. In our relational
adversary method, we assume there exists a $T$-query randomized
algorithm for function $F$. We consider a set $\mathcal{A}$ of
0-inputs of $F$, a set $\mathcal{B}$ of 1-inputs, and an arbitrary real-valued
relation function $R(A, B) \ge 0$ for $A \in \mathcal{A}$ and $B \in \mathcal{B}$.
Intuitively, $R(A, B)$ should be large if $A$ and $B$ differ in
only a few locations. We then fix a probability distribution
$\mathcal{D}$ over inputs; by Yao's minimax principle, there exists a
$T$-query deterministic algorithm $\Gamma^*$ that succeeds with high
probability on inputs drawn from $\mathcal{D}$. Let $W_A$ be the set of
0-inputs and $W_B$ the set of 1-inputs on which $\Gamma^*$ succeeds.
Using the relation function $R$, we define a separation measure
$S$ between $W_A$ and $W_B$, and show that (1) initially
$S = 0$, (2) by the end of the computation $S$ must be large,
and (3) $S$ increases by only a small amount as the result of
each query. It follows that $T$ must be large.

Undoubtedly any randomized lower bound proved using our 
relational method could also be proved "bare-handed," with- 
out any quantum intuition. However, our method makes 
it easier to focus on what is unique about a problem, and 
ignore what is common among many problems. 

Our starting point is the "most general" adversary theorem 
in Ambainis's original paper (Theorem 6 in [5]), which he
introduced to prove a quantum lower bound for the problem
of inverting a permutation. Here the input is a permutation
$\sigma(1), \ldots, \sigma(N)$, and the task is to output 0 if $\sigma^{-1}(1) \le N/2$
and 1 otherwise. To lower-bound this problem's query
6 We are thinking here of the hybrid method [5] as a cousin
of the adversary method.

7 Indeed, Ambainis has given problems for which the ad- 
versary method provably yields a better lower bound than 
the polynomial method. 



complexity, what we would like to say is this: 

Given any 0-input $\sigma$ and any location $x$, if we choose a
random 1-input $\tau$ that is 'related' to $\sigma$, then the probability
$\theta(\sigma, x)$ over $\tau$ that $\sigma(x)$ does not equal $\tau(x)$ is small. In
other words, the algorithm is unlikely to distinguish $\sigma$ from
a random neighbor $\tau$ of $\sigma$ by querying $x$.

Unfortunately, the above claim is false. Letting $x = \sigma^{-1}(1)$,
we have that $\sigma(x) \ne \tau(x)$ for every 1-input $\tau$, and thus
$\theta(\sigma, x) = 1$. Ambainis resolves this difficulty by letting
us take the maximum, over all 0-inputs $\sigma$ and 1-inputs $\tau$
that are related and differ at $x$, of the geometric mean
$\sqrt{\theta(\sigma, x)\,\theta(\tau, x)}$. Even if $\theta(\sigma, x) = 1$, the geometric mean
is still small provided that $\theta(\tau, x)$ is small. More formally:

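Theorem 3 (Ambainis). Let $\mathcal{A} \subseteq F^{-1}(0)$ and $\mathcal{B} \subseteq F^{-1}(1)$
be sets of inputs to function $F$. Let $R(A, B) \ge 0$ be
a real-valued function, and for $A \in \mathcal{A}$, $B \in \mathcal{B}$, and location
$x$, let

\[ \theta(A, x) = \frac{\sum_{B^* \in \mathcal{B} \,:\, A(x) \ne B^*(x)} R(A, B^*)}{\sum_{B^* \in \mathcal{B}} R(A, B^*)}, \]

\[ \theta(B, x) = \frac{\sum_{A^* \in \mathcal{A} \,:\, A^*(x) \ne B(x)} R(A^*, B)}{\sum_{A^* \in \mathcal{A}} R(A^*, B)}, \]

where the denominators are all nonzero. Then the number
of quantum queries needed to evaluate $F$ with at least 9/10
probability is $\Omega(1/\upsilon_{\mathrm{geom}})$, where

\[ \upsilon_{\mathrm{geom}} = \max_{\substack{A \in \mathcal{A},\ B \in \mathcal{B},\ x \,:\\ R(A,B) > 0,\ A(x) \ne B(x)}} \sqrt{\theta(A, x)\,\theta(B, x)}. \]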

To illustrate we show the following.

Proposition 4 (Ambainis). The quantum query complexity
of inverting a permutation is $\Omega(\sqrt{N})$.

Proof. Let $\mathcal{A}$ be the set of all permutations $\sigma$ with $\sigma^{-1}(1) \le N/2$,
and $\mathcal{B}$ be the set of permutations $\tau$ with $\tau^{-1}(1) > N/2$.
Given $\sigma \in \mathcal{A}$ and $\tau \in \mathcal{B}$, let $R(\sigma, \tau) = 1$ if $\sigma$ and $\tau$
differ only at locations $\sigma^{-1}(1)$ and $\tau^{-1}(1)$, and $R(\sigma, \tau) = 0$
otherwise. Then given $\sigma, \tau$ with $R(\sigma, \tau) = 1$, if $x \ne \sigma^{-1}(1)$
then $\theta(\sigma, x) = 2/N$, and if $x \ne \tau^{-1}(1)$ then $\theta(\tau, x) = 2/N$.
So $\max_{x \,:\, \sigma(x) \ne \tau(x)} \sqrt{\theta(\sigma, x)\,\theta(\tau, x)} = \sqrt{2/N}$. □
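The value $\sqrt{2/N}$ in this proof can be checked by brute force for small $N$. The sketch below enumerates all permutations, applies the relation $R$ from the proof, and evaluates the geometric-mean quantity of Theorem 3 directly. The function name `inversion_upsilon_geom` and the 0-based position convention are assumptions of this example; it is only practical for very small even $N$.

```python
from itertools import permutations
from math import sqrt, isclose


def inversion_upsilon_geom(N):
    """Brute-force upsilon_geom for inverting a permutation of {1..N}, N even."""
    perms = list(permutations(range(1, N + 1)))
    half = N // 2
    # 0-based position of the value 1; "< half" means sigma^{-1}(1) <= N/2.
    A = [p for p in perms if p.index(1) < half]
    B = [p for p in perms if p.index(1) >= half]

    def R(s, t):
        # 1 if s and t differ exactly at the two positions holding the value 1.
        diff = {x for x in range(N) if s[x] != t[x]}
        return 1 if diff == {s.index(1), t.index(1)} else 0

    def theta_A(s, x):
        rel = [t for t in B if R(s, t)]
        return sum(1 for t in rel if s[x] != t[x]) / len(rel)

    def theta_B(t, x):
        rel = [s for s in A if R(s, t)]
        return sum(1 for s in rel if s[x] != t[x]) / len(rel)

    return max(
        sqrt(theta_A(s, x) * theta_B(t, x))
        for s in A for t in B if R(s, t)
        for x in range(N) if s[x] != t[x]
    )
```

For example, `inversion_upsilon_geom(4)` should agree with `sqrt(2 / 4)` up to floating-point error.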

The only difference between Theorem 3 and our relational
adversary theorem is that in the latter, we take the minimum
of $\theta(A, x)$ and $\theta(B, x)$ instead of the geometric mean.
Taking the reciprocal then gives up to a quadratically better
lower bound: for example, we obtain that the randomized
query complexity of inverting a permutation is $\Omega(N)$. However,
the proofs of the two theorems are quite different.

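Theorem 5. Let $\mathcal{A}, \mathcal{B}, R, \theta$ be as in Theorem 3. Then
the number of randomized queries needed to evaluate $F$ with
at least 9/10 probability is $\Omega(1/\upsilon_{\min})$, where

\[ \upsilon_{\min} = \max_{\substack{A \in \mathcal{A},\ B \in \mathcal{B},\ x \,:\\ R(A,B) > 0,\ A(x) \ne B(x)}} \min\{\theta(A, x),\ \theta(B, x)\}. \]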
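To see how the geometric-mean and minimum parameters differ on a concrete function, the following sketch computes both for arbitrary finite input sets. The helper name `adversary_parameters` and the example based on the OR function (one all-zeros 0-input, the $N$ unit vectors as 1-inputs, and $R \equiv 1$) are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def adversary_parameters(A, B, R):
    """Return (upsilon_geom, upsilon_min) for 0-inputs A, 1-inputs B, relation R.

    Inputs are equal-length tuples; R(a, b) is a nonnegative weight.
    """
    def theta(inp, others, x, rel):
        # Weighted fraction of related inputs that differ from inp at x.
        total = sum(rel(inp, o) for o in others)
        differing = sum(rel(inp, o) for o in others if inp[x] != o[x])
        return differing / total

    geom = 0.0
    mini = 0.0
    for a in A:
        for b in B:
            if R(a, b) <= 0:
                continue
            for x in range(len(a)):
                if a[x] == b[x]:
                    continue
                ta = theta(a, B, x, R)
                tb = theta(b, A, x, lambda b_, a_: R(a_, b_))
                geom = max(geom, sqrt(ta * tb))
                mini = max(mini, min(ta, tb))
    return geom, mini


def or_instance(N):
    """0-input: all zeros. 1-inputs: the N unit vectors. R is identically 1."""
    A = [tuple([0] * N)]
    B = [tuple(1 if i == j else 0 for i in range(N)) for j in range(N)]
    return A, B, (lambda a, b: 1)
```

On `or_instance(16)` the two parameters come out as $1/4$ and $1/16$, so the reciprocals give the familiar $\sqrt{N}$ versus $N$ gap between the quantum and randomized bounds for this toy example.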



Proof. Let $\Gamma$ be a randomized algorithm that, given an
input $A$, returns $F(A)$ with at least 9/10 probability. Let
$T$ be the number of queries made by $\Gamma$. For all $A \in \mathcal{A}$,
$B \in \mathcal{B}$, define

\[ M(A) = \sum_{B^* \in \mathcal{B}} R(A, B^*), \]

\[ M(B) = \sum_{A^* \in \mathcal{A}} R(A^*, B), \]

\[ M = \sum_{A^* \in \mathcal{A}} M(A^*) = \sum_{B^* \in \mathcal{B}} M(B^*). \]

Now let $\mathcal{D}_A$ be the distribution over $A \in \mathcal{A}$ in which each
$A$ is chosen with probability $M(A)/M$; and let $\mathcal{D}_B$ be the
distribution over $B \in \mathcal{B}$ in which each $B$ is chosen with
probability $M(B)/M$. Let $\mathcal{D}$ be an equal mixture of $\mathcal{D}_A$
and $\mathcal{D}_B$. By Yao's minimax principle, there exists a deterministic
algorithm $\Gamma^*$ that makes $T$ queries, and succeeds
with at least 9/10 probability given an input drawn from $\mathcal{D}$.
Therefore $\Gamma^*$ succeeds with at least 4/5 probability given an
input drawn from $\mathcal{D}_A$ alone, or from $\mathcal{D}_B$ alone. In other
words, letting $W_A$ be the set of $A \in \mathcal{A}$ and $W_B$ the set of
$B \in \mathcal{B}$ on which $\Gamma^*$ succeeds, we have

\[ \sum_{A \in W_A} M(A) \ge \frac{4}{5} M, \qquad \sum_{B \in W_B} M(B) \ge \frac{4}{5} M. \]

Define a predicate $P^{(t)}(A, B)$, which is true if $\Gamma^*$ has
distinguished $A \in \mathcal{A}$ from $B \in \mathcal{B}$ by the $t$-th query and false
otherwise. (To distinguish $A$ from $B$ means to query an
index $x$ for which $A(x) \ne B(x)$, given either $A$ or $B$ as
input.) Also, for all $A \in \mathcal{A}$, define a score function

\[ S^{(t)}(A) = \sum_{B^* \in \mathcal{B} \,:\, P^{(t)}(A, B^*)} R(A, B^*). \]

This function measures how much "progress" has been made
so far in separating $A$ from $\mathcal{B}$-inputs, where the $\mathcal{B}$-inputs are
weighted by $R(A, B)$. Similarly, for all $B \in \mathcal{B}$ define

\[ S^{(t)}(B) = \sum_{A^* \in \mathcal{A} \,:\, P^{(t)}(A^*, B)} R(A^*, B). \]

It is clear that for all $t$,

\[ \sum_{A \in \mathcal{A}} S^{(t)}(A) = \sum_{B \in \mathcal{B}} S^{(t)}(B). \]

So we can denote the above sum by $S^{(t)}$ and think of it as a
global progress measure. We will show the following about
$S^{(t)}$:

(i) $S^{(0)} = 0$ initially.

(ii) $S^{(T)} \ge 3M/5$ by the end.

(iii) $\Delta S^{(t)} \le 3 \upsilon_{\min} M$ for all $t$, where $\Delta S^{(t)} = S^{(t)} - S^{(t-1)}$
is the amount by which $S^{(t)}$ increases as the result of
a single query.

It follows from (i)-(iii) that

\[ T \ge \frac{3M/5}{3 \upsilon_{\min} M} = \frac{1}{5 \upsilon_{\min}}, \]

which establishes the theorem. Part (i) is obvious. For
part (ii), observe that for every pair $(A, B)$ with $A \in W_A$
and $B \in W_B$, the algorithm $\Gamma^*$ must query an $x$ such that
$A(x) \ne B(x)$. Thus

\[ S^{(T)} \ge \sum_{A \in W_A,\ B \in W_B} R(A, B)
   \ge \sum_{A \in W_A} M(A) - \sum_{B \notin W_B} M(B)
   \ge \frac{4}{5} M - \frac{1}{5} M. \]

It remains only to show part (iii). Suppose $\Delta S^{(t)} > 3 \upsilon_{\min} M$
for some $t$; we will obtain a contradiction. Let

\[ \Delta S^{(t)}(A) = S^{(t)}(A) - S^{(t-1)}(A), \]

and let $C_A$ be the set of $A \in \mathcal{A}$ for which $\Delta S^{(t)}(A) >
\upsilon_{\min} M(A)$. Since

\[ \sum_{A \in \mathcal{A}} \Delta S^{(t)}(A) = \Delta S^{(t)} > 3 \upsilon_{\min} M, \]

it follows by Markov's inequality that

\[ \sum_{A \in C_A} \Delta S^{(t)}(A) \ge \frac{2}{3} \Delta S^{(t)}. \]

Similarly, letting $C_B$ be the set of $B \in \mathcal{B}$ for which $\Delta S^{(t)}(B) >
\upsilon_{\min} M(B)$, we have

\[ \sum_{B \in C_B} \Delta S^{(t)}(B) \ge \frac{2}{3} \Delta S^{(t)}. \]

In other words, at least 2/3 of the increase in $S^{(t)}$ comes
from $(A, B)$ pairs such that $A \in C_A$, and at least 2/3 comes
from $(A, B)$ pairs such that $B \in C_B$. Hence, by a 'pigeonhole'
argument, there exists an $A \in C_A$ and $B \in C_B$ with
$R(A, B) > 0$ that are distinguished by the $t$-th query. In
other words, there exists an $x$ with $A(x) \ne B(x)$, such that
the $t$-th index queried by $\Gamma^*$ is $x$ whether the input is $A$ or
$B$. Then since $A \in C_A$, we have $\upsilon_{\min} M(A) < \Delta S^{(t)}(A)$,
and hence

\[ \upsilon_{\min} < \frac{\Delta S^{(t)}(A)}{M(A)} \le \frac{\sum_{B^* \in \mathcal{B} \,:\, A(x) \ne B^*(x)} R(A, B^*)}{\sum_{B^* \in \mathcal{B}} R(A, B^*)} \]

which equals $\theta(A, x)$. Similarly $\upsilon_{\min} < \theta(B, x)$ since $B \in C_B$.
This contradicts the definition

\[ \upsilon_{\min} = \max_{\substack{A \in \mathcal{A},\ B \in \mathcal{B},\ x \,:\\ R(A,B) > 0,\ A(x) \ne B(x)}} \min\{\theta(A, x),\ \theta(B, x)\}, \]

and we are done. □

5. SNAKES 

For our lower bounds, it will be convenient to generalize 
random walks to arbitrary distributions over paths, which 
we call snakes. 

Definition 6. Given a vertex $h$ in $G$ and a positive integer
$L$, a snake distribution $\mathcal{D}_{h,L}$ (parameterized by $h$ and
$L$) is a probability distribution over paths $(x_0, \ldots, x_{L-1})$ in
$G$, such that each $x_t$ is either equal or adjacent to $x_{t+1}$, and
$x_{L-1} = h$. Let $D_{h,L}$ be the support of $\mathcal{D}_{h,L}$. Then an element
of $D_{h,L}$ is called a snake; the part near $x_0$ is the tail
and the part near $x_{L-1} = h$ is the head.

Given a snake $X$ and integer $t$, we use $X[t]$ as shorthand
for $\{x_0, \ldots, x_t\}$.



Definition 7. We say a snake $X \in D_{h,L}$ is $\varepsilon$-good if
the following holds. Choose $j$ uniformly at random from
$\{0, \ldots, L-1\}$, and let $Y = (y_0, \ldots, y_{L-1})$ be a snake drawn
from $\mathcal{D}_{h,L}$ conditioned on $x_t = y_t$ for all $t > j$. Then

(i) Letting $S_{X,Y}$ be the set of vertices $v$ in $X \cap Y$ such that
$\min\{t : x_t = v\} = \min\{t : y_t = v\}$, we have
$\Pr_{j,Y}[X \cap Y = S_{X,Y}] \ge 9/10$.

(ii) For all vertices $v$, $\Pr_{j,Y}[v \in Y[j]] \le \varepsilon$.



The procedure above (wherein we choose a $j$ uniformly at
random, then draw a $Y$ from $\mathcal{D}_{h,L}$ consistent with $X$ on all
steps later than $j$) will be important in what follows. We
call it the snake $X$ flicking its tail. Intuitively, a snake is
good if it is spread out fairly evenly in $G$, so that when it
flicks its tail, (1) with high probability the old and new tails
do not intersect, and (2) any particular vertex is hit by the
new tail with probability at most $\varepsilon$.
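The tail-flicking procedure can be illustrated with a toy snake distribution on the hypercube. In the sketch below the snake is grown backwards from the head $h$ by a lazy walk (stay put or move to a uniformly chosen neighbor); this particular distribution is an assumption made for the example, chosen because its Markov structure makes the conditional resampling easy, and it is not necessarily the distribution used for the lower bounds. The names `sample_snake` and `flick_tail` are hypothetical.

```python
import random


def neighbors(v, n):
    """Hypercube neighbors of v, with vertices encoded as n-bit integers."""
    return [v ^ (1 << i) for i in range(n)]


def sample_tail(start, steps, n, rng):
    """Lazy walk of `steps` steps from `start`: stay or move to a neighbor."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice([path[-1]] + neighbors(path[-1], n)))
    return path


def sample_snake(h, L, n, rng):
    """Snake (x_0, ..., x_{L-1}) with x_{L-1} = h, grown backwards from h."""
    return sample_tail(h, L - 1, n, rng)[::-1]


def flick_tail(X, n, rng):
    """Choose j uniformly, keep x_t for t > j, and resample x_0, ..., x_j.

    Returns (j, Y). For the backward-walk distribution above, resampling
    the tail from x_{j+1} is exactly the conditional distribution.
    """
    L = len(X)
    j = rng.randrange(L)
    if j == L - 1:
        # Only the head is fixed, so Y is a fresh snake ending at h.
        return j, sample_snake(X[-1], L, n, rng)
    new_tail = sample_tail(X[j + 1], j + 1, n, rng)[1:]   # y_j, ..., y_0
    return j, new_tail[::-1] + X[j + 1:]
```

A flicked snake `Y` has the same length and head as `X`, agrees with `X` on every step after `j`, and still moves by at most one edge per step.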

We now explain our 'snake method' for proving lower bounds
for Local Search. Given a snake $X$, we define an input
$f_X$ with a unique local minimum at $x_0$, and $f$-values that
decrease along $X$ from head to tail. Then, given inputs $f_X$
and $f_Y$ with $X \cap Y = S_{X,Y}$, we let the relation function
$R(f_X, f_Y)$ be proportional to the probability that snake $Y$
is obtained by $X$ flicking its tail. (If $X \cap Y \ne S_{X,Y}$ we let
$R = 0$.) Let $f_X$ and $g_Y$ be inputs with $R(f_X, g_Y) > 0$,
and let $v$ be a vertex such that $f_X(v) \ne g_Y(v)$. Then if
all snakes were good, there would be two mutually exclusive
cases: (1) $v$ belongs to the tail of $X$, or (2) $v$ belongs to the
tail of $Y$. In case (1), $v$ is hit with small probability when $Y$
flicks its tail, so $\theta(f_Y, v)$ is small. In case (2), $v$ is hit with
small probability when $X$ flicks its tail, so $\theta(f_X, v)$ is small.
In either case, then, the geometric mean $\sqrt{\theta(f_X, v)\,\theta(f_Y, v)}$
and minimum $\min\{\theta(f_X, v),\ \theta(f_Y, v)\}$ are small. So even
though $\theta(f_X, v)$ or $\theta(f_Y, v)$ could be large individually,
Theorems 3 and 5 yield a good lower bound, as in the case of
inverting a permutation (see Figure 1).

One difficulty is that not all snakes are good; at best, a large 
fraction of them are. We could try deleting all inputs $f_X$ 
such that $X$ is not good, but that might ruin some remaining 
inputs, which would then have fewer neighbors. So we 
would have to delete those inputs as well, and so on ad 
infinitum. What we need is basically a way to replace "all 
inputs" by "most inputs" in Theorems 4 and 5. 

Fortunately, a simple graph-theoretic lemma can accomplish 
this. The lemma (see Diestel [12, p. 6] for example) says that 
any graph with average degree at least $k$ contains an induced 
subgraph with minimum degree at least $k/2$. Here we prove 
a weighted analogue of the lemma. 

Lemma 8. Let $p(1), \ldots, p(m)$ be positive reals summing 
to 1. Also let $w(i,j)$ for $i,j \in \{1, \ldots, m\}$ be nonnegative 
reals satisfying $w(i,j) = w(j,i)$ and $\sum_{i,j} w(i,j) \ge r$. Then 
there exists a nonempty subset $U \subseteq \{1, \ldots, m\}$ such that for 
all $i \in U$, $\sum_{j \in U} w(i,j) \ge r\,p(i)/2$. 



[Figure 1 image. Labels: "Large $\theta(f_X, v)$ but small $\theta(f_Y, v)$", "$j+1$", "$x_{L-1} = y_{L-1}$".]

Figure 1: For every vertex $v$ such that $f_X(v) \ne f_Y(v)$, 
either when snake $X$ flicks its tail $v$ is not hit with 
high probability, or when snake $Y$ flicks its tail $v$ is 
not hit with high probability. 



Proof. If $r = 0$ then the lemma trivially holds, so assume 
$r > 0$. We construct $U$ via an iterative procedure. 
Let $U(0) = \{1, \ldots, m\}$. Then for all $t$, if there exists an 
$i^* \in U(t)$ for which 

\[ \sum_{j \in U(t)} w(i^*, j) < \frac{r}{2}\, p(i^*), \]

then set $U(t+1) = U(t) \setminus \{i^*\}$. Otherwise halt and return 
$U = U(t)$. To see that the $U$ so constructed is nonempty, 
observe that when we remove $i^*$, the sum $\sum_{i \in U(t)} p(i)$ 
decreases by $p(i^*)$, while $\sum_{i,j \in U(t)} w(i,j)$ decreases by at most 

\[ \sum_{j \in U(t)} w(i^*, j) + \sum_{j \in U(t)} w(j, i^*) < r\, p(i^*). \]

So since $\sum_{i,j \in U(t)} w(i,j)$ was at least $r$ to begin with, and all 
removals together decrease it by less than $r \sum_{i^*} p(i^*) \le r$, it must 
still be positive at the end of the procedure; hence $U$ must 
be nonempty. □ 
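
The deletion procedure in the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `prune_to_heavy_subset`, the 0-based indexing, and the example weights are hypothetical and not taken from the paper.

```python
from typing import List, Set


def prune_to_heavy_subset(p: List[float], w: List[List[float]], r: float) -> Set[int]:
    """Return the set U built by the deletion procedure in the proof of Lemma 8.

    Indices are 0-based here (the lemma uses 1..m).
    """
    alive = set(range(len(p)))
    while True:
        # Look for an index whose weight into the surviving set is too small.
        light = next(
            (i for i in sorted(alive)
             if sum(w[i][j] for j in alive) < r * p[i] / 2),
            None,
        )
        if light is None:
            return alive  # every survivor meets the r * p(i) / 2 threshold
        alive.remove(light)


# Illustrative data (an assumption): two heavy indices (0, 1), two light ones (2, 3).
p_example = [0.25, 0.25, 0.25, 0.25]
w_example = [
    [0.0, 0.4, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.05],
    [0.0, 0.0, 0.05, 0.0],
]
r_example = sum(sum(row) for row in w_example)
U_example = prune_to_heavy_subset(p_example, w_example, r_example)
```

On this data the two light indices are deleted one after the other, and the surviving pair satisfies the bound of the lemma.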

We can now prove the main result of the section. 

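Theorem 9. Suppose a snake drawn from $\mathcal{D}_{h,L}$ is $\varepsilon$-good 
with probability at least 9/10. Then 

\[ \mathrm{RLS}(G) = \Omega(1/\varepsilon), \qquad \mathrm{QLS}(G) = \Omega\left(\sqrt{1/\varepsilon}\right). \]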



Proof. Given a snake $X \in D_{h,L}$, we construct an input 
function $f_X$ as follows. For each $v \in X$, let $f_X(v) = 
\min\{t : x_t = v\}$; and for each $v \notin X$, let $f_X(v) = \Delta(v,h) + L$, 
where $\Delta(v,h)$ is the distance from $v$ to $h$ in $G$. Clearly 
$f_X$ so defined has a unique local minimum at $x_0$. To obtain 
a decision problem, we stipulate that querying $x_0$ reveals 
an answer bit (0 or 1) in addition to $f_X(x_0)$; the algorithm's 
goal is then to return the answer bit. Obviously a 
lower bound for the decision problem implies a corresponding 
lower bound for the search problem. Let us first prove 
the theorem in the case that all snakes in $D_{h,L}$ are $\varepsilon$-good. 
Let $p(X)$ be the probability of drawing snake $X$ from $\mathcal{D}_{h,L}$. 
Also, given snakes $X$, $Y$ and $j \in \{0, \ldots, L-1\}$, let $q_j(X,Y)$ 
be the probability that $X^* = Y$, if $X^*$ is drawn from $\mathcal{D}_{h,L}$ 
conditioned on agreeing with $X$ on all steps later than $j$. 
Then define 

\[ w(X,Y) = \frac{p(X)}{L} \sum_{j=0}^{L-1} q_j(X,Y). \]



Our first claim is that $w$ is symmetric; that is, $w(X,Y) = 
w(Y,X)$. It suffices to show that 

\[ p(X)\, q_j(X,Y) = p(Y)\, q_j(Y,X) \]

for all $j$. We can assume $X$ agrees with $Y$ on all steps later 
than $j$, since otherwise $q_j(X,Y) = q_j(Y,X) = 0$. Given 
an $X^* \in D_{h,L}$, let $A$ denote the event that $X^*$ agrees with 
$X$ (or equivalently $Y$) on all steps later than $j$, and let $B_X$ 
(resp. $B_Y$) denote the event that $X^*$ agrees with $X$ (resp. 
$Y$) on steps 0 to $j$. Then 

\[ p(X)\, q_j(X,Y) = \Pr[A]\, \Pr[B_X \mid A] \cdot \Pr[B_Y \mid A] = p(Y)\, q_j(Y,X). \]

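Now let $E(X,Y)$ denote the event that $X \cap Y = S_{X,Y}$, 
where $S_{X,Y}$ is as in Definition 7. Also, let $f_X$ be the input 
obtained from $X$ that has answer bit 0, and $g_X$ be the input 
that has answer bit 1. To apply Theorems 4 and 5, take $A = 
\{f_X : X \in D_{h,L}\}$ and $B = \{g_X : X \in D_{h,L}\}$. Then take 
$R(f_X, g_Y) = w(X,Y)$ if $E(X,Y)$ holds, and $R(f_X, g_Y) = 0$ 
otherwise. Given $f_X \in A$ and $g_Y \in B$ with $R(f_X, g_Y) > 0$, 
and letting $v$ be a vertex such that $f_X(v) \ne g_Y(v)$, we must 
then have either $v \notin X$ or $v \notin Y$. Suppose the former case; 
then 

\[ \sum_{f_{X^*} \in A \,:\, f_{X^*}(v) \ne g_Y(v)} R(f_{X^*}, g_Y) \le \sum_{f_{X^*} \in A \,:\, f_{X^*}(v) \ne g_Y(v)} \frac{p(Y)}{L} \sum_{j=0}^{L-1} q_j(Y, X^*) \le \varepsilon\, p(Y), \]

since $Y$ is $\varepsilon$-good. Thus 

\[ \theta(g_Y, v) = \frac{\sum_{f_{X^*} \in A \,:\, f_{X^*}(v) \ne g_Y(v)} R(f_{X^*}, g_Y)}{\sum_{f_{X^*} \in A} R(f_{X^*}, g_Y)} \le \frac{\varepsilon\, p(Y)}{9p(Y)/10}. \]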

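Similarly, if $v \notin Y$ then $\theta(f_X, v) \le 10\varepsilon/9$ by symmetry. 
Hence 

\[ \upsilon_{\min} = \max_{\substack{f_X \in A,\ g_Y \in B,\ v \,: \\ R(f_X, g_Y) > 0, \\ f_X(v) \ne g_Y(v)}} \min\{\theta(f_X, v), \theta(g_Y, v)\} \le \frac{\varepsilon}{9/10}, \]

\[ \upsilon_{\mathrm{geom}} = \max_{\substack{f_X \in A,\ g_Y \in B,\ v \,: \\ R(f_X, g_Y) > 0, \\ f_X(v) \ne g_Y(v)}} \sqrt{\theta(f_X, v)\,\theta(g_Y, v)} \le \sqrt{\frac{\varepsilon}{9/10}}, \]

the latter since $\theta(f_X, v) \le 1$ and $\theta(g_Y, v) \le 1$ for all $f_X$, $g_Y$ 
and $v$. We now turn to the general case, in which a snake 
drawn from $\mathcal{D}_{h,L}$ is $\varepsilon$-good with probability at least 9/10. 
Let $G(X)$ denote the event that $X$ is $\varepsilon$-good. Take $A^* = 
\{f_X \in A : G(X)\}$ and $B^* = \{g_Y \in B : G(Y)\}$, and take 
$R(f_X, g_Y)$ as before. Then since 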



\[ \sum_{X,Y \,:\, E(X,Y)} w(X,Y) \ge \frac{9}{10}, \]

by the union bound we have 

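\[ \sum_{f_X \in A^*,\ g_Y \in B^*} R(f_X, g_Y) \ge \sum_{X,Y \,:\, G(X) \wedge G(Y) \wedge E(X,Y)} w(X,Y) - \sum_{X \,:\, \neg G(X)} p(X) - \sum_{Y \,:\, \neg G(Y)} p(Y) \ge \frac{9}{10} - \frac{1}{10} - \frac{1}{10} = \frac{7}{10}. \]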

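So by Lemma 8, there exist subsets $\tilde{A} \subseteq A^*$ and $\tilde{B} \subseteq B^*$ 
such that for all $f_X \in \tilde{A}$ and $g_Y \in \tilde{B}$, 

\[ \sum_{g_{Y^*} \in \tilde{B}} R(f_X, g_{Y^*}) \ge \frac{7p(X)}{20}, \qquad \sum_{f_{X^*} \in \tilde{A}} R(f_{X^*}, g_Y) \ge \frac{7p(Y)}{20}. \]

So for all $f_X$, $g_Y$ with $R(f_X, g_Y) > 0$, and all $v$ such that 
$f_X(v) \ne g_Y(v)$, either $\theta(f_X, v) \le 20\varepsilon/7$ or $\theta(g_Y, v) \le 
20\varepsilon/7$. Hence $\upsilon_{\min} \le 20\varepsilon/7$ and $\upsilon_{\mathrm{geom}} \le \sqrt{20\varepsilon/7}$. □ 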
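
The input construction at the start of the proof can be checked on a small case. The sketch below assumes the Boolean hypercube as the graph, encodes vertices as integers, and uses a hand-picked snake; the helper names (`snake_input`, `local_minima`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Dict, List


def neighbors(v: int, n: int) -> List[int]:
    """Neighbors of v in the n-dimensional hypercube (vertices are n-bit integers)."""
    return [v ^ (1 << i) for i in range(n)]


def distances_to(h: int, n: int) -> Dict[int, int]:
    """Breadth-first search distances from every vertex to h."""
    dist = {h: 0}
    queue = deque([h])
    while queue:
        u = queue.popleft()
        for w in neighbors(u, n):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def snake_input(snake: List[int], n: int) -> Dict[int, int]:
    """f_X(v) = first time the snake visits v, or distance(v, head) + L off the snake."""
    L = len(snake)
    dist = distances_to(snake[-1], n)  # the head is the last vertex x_{L-1}
    first_visit: Dict[int, int] = {}
    for t, v in enumerate(snake):
        first_visit.setdefault(v, t)
    return {v: first_visit.get(v, dist[v] + L) for v in range(2 ** n)}


def local_minima(f: Dict[int, int], n: int) -> List[int]:
    """Vertices whose value is at most that of every neighbor."""
    return [v for v in sorted(f) if all(f[v] <= f[w] for w in neighbors(v, n))]


# Hypothetical snake in the 3-cube: tail 000, head 111, one stalled step.
example_snake = [0b000, 0b001, 0b001, 0b011, 0b111]
f_example = snake_input(example_snake, 3)
```

For this snake the only local minimum of `f_example` is the tail vertex `0b000`, in line with the claim that $f_X$ has a unique local minimum at $x_0$.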



6. SPECIFIC GRAPHS 

In this section we apply the 'snake method' developed in 
Section 5 to specific examples of graphs: the Boolean 
hypercube in Section 6.1, and the $d$-dimensional cubic grid 
(for $d \ge 3$) in Section 6.2. 

6.1 Boolean Hypercube 

Abusing notation, we let $\{0,1\}^n$ denote the $n$-dimensional 
Boolean hypercube, that is, the graph whose vertices are 
$n$-bit strings, with two vertices adjacent if and only if they 
have Hamming distance 1. Given a vertex $v \in \{0,1\}^n$, 
we let $v[0], \ldots, v[n-1]$ denote the $n$ bits of $v$, and let $v^{(i)}$ 
denote the neighbor obtained by flipping bit $v[i]$. In this 
section we lower-bound $\mathrm{RLS}(\{0,1\}^n)$ and $\mathrm{QLS}(\{0,1\}^n)$. 

Fix a 'snake head' $h \in \{0,1\}^n$ and take $L = 2^{n/2}/100$. 
We define the snake distribution $\mathcal{D}_{h,L}$ via what we call a 
coordinate loop, as follows. Starting from $x_0 = h$, for each 
$t$ take $x_{t+1} = x_t$ with 1/2 probability, and $x_{t+1} = x_t^{(t \bmod n)}$ 
with 1/2 probability. The following is a basic fact about 
this distribution. 



Proposition 10. The coordinate loop mixes completely 
in $n$ steps, in the sense that if $t^* \ge t + n$, then $x_{t^*}$ is a 
uniform random vertex conditioned on $x_t$. 
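
A minimal sketch of the coordinate loop, together with an exact check of the mixing claim for a small $n$, might look as follows. It assumes vertices are encoded as $n$-bit integers with bit $i$ playing the role of $v[i]$; the function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product
from typing import List


def coordinate_loop(h: int, n: int, length: int, rng: random.Random) -> List[int]:
    """Sample x_0, ..., x_{length-1}: at step t, flip bit (t mod n) with probability 1/2."""
    x = [h]
    for t in range(length - 1):
        step = x[-1] ^ (1 << (t % n)) if rng.random() < 0.5 else x[-1]
        x.append(step)
    return x


def distribution_after_n_steps(start: int, t: int, n: int) -> Counter:
    """Exact law of x_{t+n} given x_t = start, by enumerating all 2^n coin outcomes."""
    counts: Counter = Counter()
    for coins in product((0, 1), repeat=n):
        v = start
        for s, flip in enumerate(coins):
            if flip:
                v ^= 1 << ((t + s) % n)
        counts[v] += 1
    return counts


counts_example = distribution_after_n_steps(start=0b0101, t=2, n=4)
```

Any $n$ consecutive steps visit every coordinate exactly once, so each of the $2^n$ coin outcomes leads to a different vertex. `counts_example` therefore assigns count 1 to each of the 16 vertices, which is the uniform distribution.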

We could also use the random walk distribution, following 
Aldous [3]. However, not only is the coordinate loop 
distribution easier to work with (since it produces fewer 
self-intersections), it also yields a better lower bound (since it 
mixes completely in $n$ steps, as opposed to approximately 
in $n \log n$ steps). 

We first upper-bound the probability, over $X$, $j$, and $Y[j]$, 
that $X \cap Y \ne S_{X,Y}$ (where $S_{X,Y}$ is as in Definition 7). 



Lemma 11. Suppose $X$ is drawn from $\mathcal{D}_{h,L}$, $j$ is drawn 
uniformly from $\{0, \ldots, L-1\}$, and $Y[j]$ is drawn from $\mathcal{D}_{x_j, j}$. 
Then $\Pr_{X,j,Y[j]}\left[X \cap Y = S_{X,Y}\right] \ge 0.9999$. 

Proof. Call a disagreement a vertex $v$ such that 

\[ \min\{t : x_t = v\} \ne \min\{t^* : y_{t^*} = v\}. \]

Clearly if there are no disagreements then $X \cap Y = S_{X,Y}$. 
If $v$ is a disagreement, with $t$ and $t^*$ the two minima above, 
then by the definition of $\mathcal{D}_{h,L}$ we 
cannot have both $t > j - n$ and $t^* > j - n$. So by 
Proposition 10, either $y_{t^*}$ is uniformly random conditioned on $X$, 
or $x_t$ is uniformly random conditioned on $Y[j]$. Hence 
$\Pr_{X,j,Y[j]}\left[x_t = y_{t^*}\right] = 1/2^n$. So by the union bound 
over the at most $L^2$ pairs $(t, t^*)$, 

\[ \Pr_{X,j,Y[j]}\left[X \cap Y \ne S_{X,Y}\right] \le \frac{L^2}{2^n} = 0.0001. \]

□ 



We now argue that, unless $X$ spends a 'pathological' amount 
of time in one part of the hypercube, the probability of any 
vertex $v$ being hit when $X$ flicks its tail is small. To prove 
this, we define a notion of sparseness, and then show that (1) 
almost all snakes drawn from $\mathcal{D}_{h,L}$ are sparse (Lemma 13), 
and (2) sparse snakes are unlikely to hit any given vertex $v$ 
(Lemma 14). 

Definition 12. Given vertices $x$, $v$ and $i \in \{0, \ldots, n-1\}$, 
let $\Delta(x, v, i)$ be the number of steps needed to reach $v$ from $x$ 
by first setting $x[i] := v[i]$, then setting $x[i-1] := v[i-1]$, 
and so on. (After we set $x[0]$ we wrap around to $x[n-1]$.) 
Then $X$ is sparse if there exists a constant $c$ such that for 
all $v \in \{0,1\}^n$ and all $k$, 

\[ \left|\{t : \Delta(x_t, v, t \bmod n) = k\}\right| \le cn\left(n + \frac{L}{2^{n-k}}\right). \]
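
To make the quantity $\Delta(x, v, i)$ concrete, here is a small sketch under one reading of the definition: every assignment counts as a step, whether or not it changes the bit, and $\Delta$ is the number of assignments performed until $x$ equals $v$ (0 if they are already equal). The integer encoding of vertices and the name `delta` are assumptions made for illustration.

```python
def delta(x: int, v: int, i: int, n: int) -> int:
    """Steps until x equals v when bits are overwritten in the order i, i-1, ..., wrapping.

    Under the reading assumed here, this is one plus the position (in that
    order) of the last bit where x and v differ, or 0 if x == v.
    """
    diff = x ^ v
    steps = 0
    for s in range(n):
        pos = (i - s) % n  # s-th coordinate in the order i, i-1, ..., 0, n-1, ...
        if (diff >> pos) & 1:
            steps = s + 1
    return steps
```

Under this reading, exactly $2^k$ of the $2^n$ vertices $x$ satisfy $\Delta(x, v, i) \le k$, since only the first $k$ coordinates in the order are allowed to differ from $v$.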



Lemma 13. If $X$ is drawn from $\mathcal{D}_{h,L}$, then $X$ is sparse 
with probability $1 - o(1)$. 

PROOF. For each $i \in \{0, \ldots, n-1\}$, the number of $t \in 
\{0, \ldots, L-1\}$ such that $t \equiv i \pmod{n}$ is at most $L/n$. For 
such a $t$, let $E_t^{(v,i,k)}$ be the event that $\Delta(x_t, v, i) \le k$; then 
$E_t^{(v,i,k)}$ holds if and only if 

\[ x_t[i-k] = v[i-k], \ \ldots, \ x_t[i-n+1] = v[i-n+1] \]

(where we wrap around to $x_t[n-1]$ after reaching $x_t[0]$). 
This occurs with probability $2^k/2^n$ over $X$. Furthermore, 
by Proposition 10, the $E_t^{(v,i,k)}$ events for different $t$'s are 
independent. So let 

\[ \mu_k = \frac{L}{n} \cdot \frac{2^k}{2^n}; \]

then for fixed $v$, $i$, $k$, the expected number of $t$'s for which 
$E_t^{(v,i,k)}$ holds is at most $\mu_k$. Thus by a Chernoff bound, if 
$\mu_k \ge 1$ then 

\[ \Pr_X\left[\left|\{t : E_t^{(v,i,k)}\}\right| > cn \cdot \mu_k\right] < \left(\frac{e^{cn-1}}{(cn)^{cn}}\right)^{\mu_k} < \frac{1}{2^{2n}} \]

for sufficiently large $c$. Similarly, if $\mu_k < 1$ then 

\[ \Pr_X\left[\left|\{t : E_t^{(v,i,k)}\}\right| > cn\right] < \left(\frac{e^{cn/\mu_k - 1}}{(cn/\mu_k)^{cn/\mu_k}}\right)^{\mu_k} < \frac{1}{2^{2n}} \]

for sufficiently large $c$. By the union bound, then, 

\[ \left|\{t : E_t^{(v,i,k)}\}\right| \le cn \cdot (1 + \mu_k) = c\left(n + \frac{L}{2^{n-k}}\right) \]

for every $v$, $i$, $k$ triple simultaneously with probability at 
least $1 - n^2 2^n / 2^{2n} = 1 - o(1)$. Summing over all $i$'s produces 
the additional factor of $n$. □ 



Lemma 14. If $X$ is sparse, then for every $v \in \{0,1\}^n$, 

\[ \Pr_{j,Y}\left[v \in Y[j]\right] = O\left(\frac{n^2}{L}\right). \]




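Proof. By assumption, for every $k \in \{0, \ldots, n\}$, 

\[ \Pr_j\left[\Delta(x_j, v, j \bmod n) = k\right] \le \frac{\left|\{t : \Delta(x_t, v, t \bmod n) = k\}\right|}{L} \le \frac{cn}{L}\left(n + \frac{L}{2^{n-k}}\right). \]

Consider the probability that $v \in Y[j]$ in the event that 
$\Delta(x_j, v, j \bmod n) = k$. Clearly 

\[ \Pr_Y\left[v \in \{y_{j-n+1}, \ldots, y_j\}\right] = \frac{1}{2^k}. \]

Also, Proposition 10 implies that for every $t \le j - n$, the 
probability that $y_t = v$ is $2^{-n}$. So by the union bound, 

\[ \Pr_Y\left[v \in \{y_0, \ldots, y_{j-n}\}\right] \le \frac{L}{2^n}. \]

Then $\Pr_{j,Y}\left[v \in Y[j]\right]$ equals 

\[ \sum_{k=0}^{n} \Pr_j\left[\Delta(x_j, v, j \bmod n) = k\right] \cdot \Pr_Y\left[v \in Y[j] \mid \Delta(x_j, v, j \bmod n) = k\right] \le \sum_{k=0}^{n} \frac{cn}{L}\left(n + \frac{L}{2^{n-k}}\right)\left(\frac{1}{2^k} + \frac{L}{2^n}\right) = O\left(\frac{n^2}{L}\right), \]

as can be verified by breaking the sum into cases and doing 
some manipulations. □ 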
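
The final estimate can be sanity-checked numerically. The sketch below assumes the summand reads $\frac{cn}{L}\left(n + L/2^{n-k}\right)\left(1/2^k + L/2^n\right)$ with $L = 2^{n/2}/100$, and takes $c = 1$ purely for illustration; the function names are hypothetical.

```python
def tail_hit_bound(n: int, c: float = 1.0) -> float:
    """Evaluate sum_{k=0}^{n} (c n / L) (n + L / 2^(n-k)) (1 / 2^k + L / 2^n)."""
    L = 2 ** (n / 2) / 100
    total = 0.0
    for k in range(n + 1):
        sparse_term = (c * n / L) * (n + L / 2 ** (n - k))  # bound on Pr_j[Delta = k]
        hit_term = 1 / 2 ** k + L / 2 ** n                  # bound on Pr_Y[v in Y[j] | Delta = k]
        total += sparse_term * hit_term
    return total


def ratio_to_n2_over_L(n: int, c: float = 1.0) -> float:
    """Ratio of the sum to n^2 / L; it should stay bounded as n grows."""
    L = 2 ** (n / 2) / 100
    return tail_hit_bound(n, c) / (n * n / L)
```

For $c = 1$ the ratio stays close to 2 across a range of $n$: the dominant contribution is $\sum_k c\,n^2/(L\,2^k) < 2c\,n^2/L$, and the three remaining cross terms are smaller by a factor of at least roughly $L/2^n$ or $L^2/2^n$.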

The main result follows easily: 

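Theorem 15. 

\[ \mathrm{RLS}(\{0,1\}^n) = \Omega\left(\frac{2^{n/2}}{n^2}\right), \qquad \mathrm{QLS}(\{0,1\}^n) = \Omega\left(\frac{2^{n/4}}{n}\right). \]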



PROOF. Take $\varepsilon = n^2/2^{n/2}$. Then by Theorem 9, it suffices 
to show that a snake $X$ drawn from $\mathcal{D}_{h,L}$ is $O(\varepsilon)$-good 
with probability at least 9/10. First, since 

\[ \Pr_{X,j,Y[j]}\left[X \cap Y = S_{X,Y}\right] \ge 0.9999 \]

by Lemma 11, Markov's inequality shows that 

\[ \Pr_X\left[\Pr_{j,Y[j]}\left[X \cap Y = S_{X,Y}\right] \ge \frac{9}{10}\right] \ge \frac{19}{20}. \]

Second, by Lemma 13, $X$ is sparse with probability $1 - o(1)$, 
and by Lemma 14, if $X$ is sparse then 

\[ \Pr_{j,Y}\left[v \in Y[j]\right] = O\left(\frac{n^2}{L}\right) = O(\varepsilon) \]

for every $v$. So both requirements of Definition 7 hold 
simultaneously with probability at least 9/10. □ 
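
The arithmetic behind the two steps of this proof can be written out as follows. This is a worked restatement under the stated choices $L = 2^{n/2}/100$ and $\varepsilon = n^2/2^{n/2}$, not additional material from the paper.

```latex
\begin{align*}
\Pr_X\!\left[\Pr_{j,Y[j]}\left[X \cap Y \ne S_{X,Y}\right] > \tfrac{1}{10}\right]
  &\le \frac{\Pr_{X,j,Y[j]}\left[X \cap Y \ne S_{X,Y}\right]}{1/10}
   \le \frac{0.0001}{0.1} = 0.001 \le \frac{1}{20}, \\[4pt]
\frac{n^2}{L} &= \frac{100\, n^2}{2^{n/2}} = 100\,\varepsilon = O(\varepsilon), \\[4pt]
\frac{1}{\varepsilon} &= \frac{2^{n/2}}{n^2}, \qquad
\sqrt{\frac{1}{\varepsilon}} = \frac{2^{n/4}}{n}.
\end{align*}
```

The first line is Markov's inequality applied to the random variable $\Pr_{j,Y[j]}[X \cap Y \ne S_{X,Y}]$, whose expectation over $X$ is at most $0.0001$ by Lemma 11. The failure probabilities $1/20$ and $o(1)$ together stay below $1/10$ for large $n$. The last line gives the two quantities that appear in the bounds of Theorem 15.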



Figure 2: In $d = 3$ dimensions, a snake drawn from 
$\mathcal{D}_{h,L}$ moves a random distance left or right, then a 
random distance up or down, then a random 
distance inward or outward, etc. 



6.2 Constant-Dimensional Grid Graph 

In the Boolean hypercube case, we defined $\mathcal{D}_{h,L}$ by a 
'coordinate loop' instead of the usual random walk mainly for 
convenience. When we move to the $d$-dimensional grid, 
though, the drawbacks of random walks become more serious: 
first, the mixing time is too long, and second, there are 
too many self-intersections, particularly if $d \le 4$. Our snake 
distribution will instead use straight lines of randomly chosen 
lengths attached at the endpoints, as in Figure 2. Let 
$G_{d,N}$ be a $d$-dimensional grid graph with $d \ge 3$. That is, 
$G_{d,N}$ has $N$ vertices of the form $v = (v[0], \ldots, v[d-1])$, 
where each $v[i]$ is in $\{1, \ldots, N^{1/d}\}$ (we assume for simplicity 
that $N$ is a $d$-th power). Vertices $v$ and $w$ are adjacent if 
and only if $|v[i] - w[i]| = 1$ for some $i \in \{0, \ldots, d-1\}$, and 
$v[j] = w[j]$ for all $j \ne i$ (so $G_{d,N}$ does not wrap around at 
the boundaries). 



We take $L = \sqrt{N}/100$, and define the snake distribution 
$\mathcal{D}_{h,L}$ as follows. Starting from $x_0 = h$, for each $T$ we take 
$x_{N^{1/d}(T+1)}$ identical to $x_{N^{1/d}T}$, but with the $(T \bmod d)$-th 
coordinate $x_{N^{1/d}(T+1)}[T \bmod d]$ replaced by a uniform random 
value in $\{1, \ldots, N^{1/d}\}$. We then take the vertices 
$x_{N^{1/d}T+1}, \ldots, x_{N^{1/d}T+N^{1/d}-1}$ to lie along the shortest path 
from $x_{N^{1/d}T}$ to $x_{N^{1/d}(T+1)}$, 'stalling' at $x_{N^{1/d}(T+1)}$ once 
that vertex has been reached. We call 

\[ \Phi_T = \left(x_{N^{1/d}T}, \ldots, x_{N^{1/d}T+N^{1/d}-1}\right) \]

a line of vertices, whose direction is $T \bmod d$. As in the 
Boolean hypercube case, we have: 

Proposition 16. $\mathcal{D}_{h,L}$ mixes completely in $dN^{1/d}$ steps, 
in the sense that if $T^* \ge T + d$, then $x_{N^{1/d}T^*}$ is a uniform 
random vertex conditioned on $x_{N^{1/d}T}$. 
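
The line-based snake distribution can be sketched as follows. The sketch assumes vertices are tuples with coordinates in $\{1, \ldots, \text{side}\}$, where `side` stands for $N^{1/d}$, and it takes the number of lines as a free parameter rather than fixing $L = \sqrt{N}/100$; the name `grid_snake` is hypothetical.

```python
import random
from typing import List, Tuple

Vertex = Tuple[int, ...]


def grid_snake(h: Vertex, side: int, num_lines: int, rng: random.Random) -> List[Vertex]:
    """Sample x_0, ..., x_{side * num_lines} on the grid {1..side}^d, starting at h."""
    d = len(h)
    current = list(h)
    snake: List[Vertex] = [tuple(current)]
    for T in range(num_lines):
        axis = T % d
        target = rng.randint(1, side)  # new value of coordinate T mod d
        for _ in range(side):
            # Move one unit toward the target, or stall once it is reached.
            if current[axis] < target:
                current[axis] += 1
            elif current[axis] > target:
                current[axis] -= 1
            snake.append(tuple(current))
    return snake


example = grid_snake((1, 1, 1), side=5, num_lines=6, rng=random.Random(0))
```

Each line uses exactly `side` steps, which is always enough to reach the target because two values of one coordinate differ by at most `side - 1`. After $d$ consecutive lines every coordinate has been resampled once, which is the mechanism behind Proposition 16.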

Lemma 11 in Section 6.1 goes through essentially without 
change. 

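Definition 17. Letting $\Delta(x, v, i)$ be as before, we say $X$ 
is sparse if there exists a constant $c$ (possibly dependent on 
$d$) such that for all vertices $v$ and all $k$, 

\[ \left|\left\{t : \Delta\left(x_t, v, \left\lfloor t/N^{1/d} \right\rfloor \bmod d\right) = k\right\}\right| \le (c \log N)\left(N^{1/d} + \frac{L}{N^{1-k/d}}\right). \]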



Lemma 18. If $X$ is drawn from $\mathcal{D}_{h,L}$, then $X$ is sparse 
with probability $1 - o(1)$. 

Lemma 19. If $X$ is sparse, then for every $v \in G_{d,N}$, 

\[ \Pr_{j,Y}\left[v \in Y[j]\right] = O\left(\frac{N^{1/d} \log N}{L}\right), \]

where the big-O hides a constant dependent on $d$. 

The proofs of Lemmas 18 and 19 are omitted from this 
abstract, since they involve no new ideas beyond those of 
Lemmas 13 and 14. Taking $\varepsilon = (\log N)/N^{1/2-1/d}$ we get, by 
the same proof as for Theorem 15: 

Theorem 20. Neglecting a constant dependent on $d$, for 
all $d \ge 3$, 

\[ \mathrm{RLS}(G_{d,N}) = \Omega\left(\frac{N^{1/2-1/d}}{\log N}\right), \qquad \mathrm{QLS}(G_{d,N}) = \Omega\left(\sqrt{\frac{N^{1/2-1/d}}{\log N}}\right). \]
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